Multi-lingual testing of a self-learning approach to phonemic transcription of orthography
Abstract
A self-learning system for grapheme-to-phoneme conversion is described and tested. The system acquires the knowledge needed for grapheme-to-phoneme conversion from a training session in which a large number of grapheme strings, paired with their corresponding (manually verified) phonemic transcriptions, are presented to the system. The result of training is a stochastic decision tree that stores statistics about corresponding graphemes and phonemes, as observed in the training material, for later retrieval. The system is tested on a number of European languages, and results from three tests are reported. In the first test, which concerns proper names, only the most probable phoneme candidate at each leaf of the tree is utilised. The second and third tests, both using a database of ordinary words, aim to analyse the phoneme and word accuracies that result from using the N-best phonemes at each leaf and from introducing phonotactic information, respectively. Using N-best candidates in combination with phonotactic information yields phoneme and word accuracies of up to 88.5% and 46.6%, respectively.
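The abstract does not specify how the tree asks its questions, how graphemes and phonemes are aligned, or how the phonotactic information is combined with the N-best candidates. The sketch below is therefore only an illustration under stated assumptions: training pairs are pre-aligned one-to-one (with `_` marking a silent letter), each tree level adds one more context letter in a fixed, hypothetical order (`CONTEXT_ORDER`), every node stores phoneme counts, and a smoothed phoneme bigram model stands in for the phonotactic component. All names (`StochasticG2PTree`, `train_bigrams`, `transcribe`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

PAD = "#"  # hypothetical word-boundary symbol
# Assumed question order: each tree level adds one context letter,
# alternating right and left of the focus letter.
CONTEXT_ORDER = [+1, -1, +2, -2, +3, -3]


def context_key(word, i, depth):
    """Path to the tree node for letter i at the given depth."""
    key = [word[i]]
    for off in CONTEXT_ORDER[:depth]:
        j = i + off
        key.append(word[j] if 0 <= j < len(word) else PAD)
    return tuple(key)


class StochasticG2PTree:
    def __init__(self, max_depth=len(CONTEXT_ORDER)):
        self.max_depth = max_depth
        self.nodes = defaultdict(Counter)  # node path -> phoneme counts

    def train(self, aligned_pairs):
        """aligned_pairs: (graphemes, phonemes) of equal length; '_' = silent."""
        for graphemes, phonemes in aligned_pairs:
            for i, ph in enumerate(phonemes):
                for depth in range(self.max_depth + 1):
                    self.nodes[context_key(graphemes, i, depth)][ph] += 1

    def n_best(self, word, i, n=3):
        """N most frequent phonemes (with relative frequency) at the deepest matching node."""
        best = None
        for depth in range(self.max_depth + 1):
            counts = self.nodes.get(context_key(word, i, depth))
            if counts is None:  # context never seen in training: stop descending
                break
            best = counts
        if best is None:
            return []
        total = sum(best.values())
        return [(ph, c / total) for ph, c in best.most_common(n)]


def train_bigrams(phoneme_strings):
    """Phoneme bigram counts, ignoring silent '_' and padding word edges."""
    counts = defaultdict(Counter)
    for phs in phoneme_strings:
        seq = [PAD] + [p for p in phs if p != "_"] + [PAD]
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts


def bigram_logprob(bigrams, a, b, alpha=1.0, vocab=50):
    """Add-alpha smoothed log P(b | a); vocab is an assumed inventory size."""
    row = bigrams.get(a, Counter())
    return math.log((row[b] + alpha) / (sum(row.values()) + alpha * vocab))


def transcribe(tree, bigrams, word, n=3, weight=1.0):
    """Viterbi search over N-best candidates, rescored by the bigram model."""
    beams = {PAD: (0.0, [])}  # last real phoneme -> (log score, output)
    for i in range(len(word)):
        new = {}
        # Letters never seen in training are treated as silent here.
        cands = tree.n_best(word, i, n) or [("_", 1.0)]
        for last, (score, out) in beams.items():
            for ph, p in cands:
                s = score + math.log(p)
                if ph == "_":
                    nxt, seq = last, out
                else:
                    s += weight * bigram_logprob(bigrams, last, ph)
                    nxt, seq = ph, out + [ph]
                if nxt not in new or s > new[nxt][0]:
                    new[nxt] = (s, seq)
        beams = new
    # Add the end-of-word transition before picking the best path.
    final = max(beams.items(),
                key=lambda kv: kv[1][0] + weight * bigram_logprob(bigrams, kv[0], PAD))
    return final[1][1]
```

With `n=1` and `weight=0` this reduces to taking the most probable phoneme at each leaf, as in the first test; raising `n` and `weight` corresponds loosely to the N-best and phonotactic variants described above.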
Publication date: 1995